fix(openai): Respect 300k token limit for embeddings API requests #33668
Conversation
- Add strict parameter to ChatDeepSeek class
- Switch to Beta API endpoint when strict mode is enabled
- Override bind_tools method to add strict: true to tool definitions
- Add comprehensive tests for strict mode functionality

Resolves langchain-ai#32670
- Add robust fallback for response serialization when model_dump() fails
- Use model_dump_json() as fallback for non-OpenAI API responses
- Improve null choices error message with debugging information
- Add tests for vLLM-style responses and improved error messages

Fixes langchain-ai#32252
- Add MAX_TOKENS_PER_REQUEST constant (300,000 tokens)
- Implement dynamic batching in _get_len_safe_embeddings to respect token limits
- Track actual token counts per chunk and batch accordingly
- Apply same fix to async version _aget_len_safe_embeddings
- Add test to verify token limit is respected with large document sets

Fixes langchain-ai#31227
# Conflicts:
# libs/partners/openai/langchain_openai/embeddings/base.py
CodSpeed Performance Report: Merging #33668 will not alter performance.
Description
Fixes #31227 - Resolves the issue where OpenAIEmbeddings exceeds OpenAI's 300,000 token per request limit, causing 400 BadRequest errors.

Problem
When embedding large document sets, LangChain would send batches containing more than 300,000 tokens in a single API request, and the OpenAI API rejected those requests with a 400 BadRequest error.
The issue occurred because:

- Each text is split into chunks of up to embedding_ctx_length tokens (8191 tokens per chunk)
- Chunks are sent to the API in groups of chunk_size (default 1000 chunks per request)
- 1000 chunks × 8191 tokens = 8,191,000 tokens → far above the 300,000 token limit
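A minimal sketch of that worst-case arithmetic (variable names below are illustrative, mirroring the parameters described above):

```python
# Illustrative arithmetic only: why fixed-size batching can exceed the limit
embedding_ctx_length = 8191        # max tokens per chunk
chunk_size = 1000                  # default number of chunks per API request
max_tokens_per_request = 300_000   # OpenAI's per-request token limit

worst_case = chunk_size * embedding_ctx_length
print(worst_case)                            # 8191000
print(worst_case > max_tokens_per_request)   # True -> the API rejects the request
```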
Solution

This PR implements dynamic batching that respects the 300k token limit:

- Add a MAX_TOKENS_PER_REQUEST = 300000 constant
- Instead of sending fixed chunk_size batches, accumulate chunks until the next chunk would push the request past the 300k limit
- Apply the same logic in both _get_len_safe_embeddings and _aget_len_safe_embeddings
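A minimal sketch of the token-aware batching idea (the function and variable names here are illustrative, not the exact code in _get_len_safe_embeddings):

```python
MAX_TOKENS_PER_REQUEST = 300_000

def batch_by_token_count(chunks: list, token_counts: list[int]) -> list[list]:
    """Group chunks so that every batch stays under the per-request token limit."""
    batches: list[list] = []
    current: list = []
    current_tokens = 0
    for chunk, n_tokens in zip(chunks, token_counts):
        # Close the current batch if adding this chunk would exceed the limit
        if current and current_tokens + n_tokens > MAX_TOKENS_PER_REQUEST:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(chunk)
        current_tokens += n_tokens
    if current:
        batches.append(current)
    return batches
```

Each resulting batch is then sent as its own embeddings API request, so no single request exceeds 300,000 tokens.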
Changes

langchain_openai/embeddings/base.py:
- Add the MAX_TOKENS_PER_REQUEST constant and token-aware batching in the sync and async embedding paths

tests/unit_tests/embeddings/test_base.py:
- Add test_embeddings_respects_token_limit() - Verifies large document sets are properly batched

Testing
All existing tests pass (280 passed, 4 xfailed, 1 xpassed).
The new test verifies that large document sets are split across multiple API requests, with each request staying under the 300,000 token limit.
Usage
After this fix, users can embed large document sets without hitting the 300k token per request error.
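A minimal usage sketch (the model name and document set below are illustrative):

```python
from langchain_openai import OpenAIEmbeddings

# Illustrative corpus: large enough that naive batching previously exceeded
# the 300,000 token per-request limit
texts = ["some long document text " * 500 for _ in range(2000)]

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Chunks are now batched so that every API request stays under 300k tokens
vectors = embeddings.embed_documents(texts)
print(len(vectors))  # one embedding vector per input text
```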
Resolves #31227